Round-trip testing POC by Nikil-Shyamsunder · Pull Request #180 · cucapra/protocols

Nikil-Shyamsunder · 2026-02-13T05:03:59Z

This isn't integrated with snapshot testing or CI yet. The obvious ways I could think of to integrate this with Turnt seemed clunky or required more complex changes, so for now I hand-rolled much of the logic to collect the tests and parse the arguments (which makes it kinda janky). I don't expect this to go into the main branch via this PR, but if this is valuable we can figure out how to do that properly later.

After implementing this, I found a few potential "bugs" in the interpreter+monitor.

How the round-trip test works:

For each .tx file where the interpreter succeeds:

1. Run the interpreter to generate an FST waveform.
2. Run the monitor on that FST with the same .prot file.
3. Check if the monitor succeeds.
4. TODO: actually compare the monitor's output against the interpreter's .tx. We could do this now, but there are slight differences in the formatting that maybe we want to consider changing first?
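A minimal sketch of the classification loop described above, not the actual contents of scripts/roundtrip.py. The two callables `run_interpreter` and `run_monitor` are hypothetical stand-ins for invoking the real tools; each takes a .tx path and returns True on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RoundTripSummary:
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def round_trip(
    tx_files: Iterable[str],
    run_interpreter: Callable[[str], bool],
    run_monitor: Callable[[str], bool],
) -> RoundTripSummary:
    """Classify each .tx file as passed, failed, or skipped.

    Both callables are hypothetical: they wrap whatever command actually
    runs the interpreter or the monitor for the given .tx file.
    """
    summary = RoundTripSummary()
    for tx in tx_files:
        if not run_interpreter(tx):
            # The interpreter did not succeed, so there is no waveform to check.
            summary.skipped.append(tx)
        elif run_monitor(tx):
            summary.passed.append(tx)
        else:
            summary.failed.append(tx)
    return summary
```

Passing the runners in as callables keeps the loop testable without the real binaries; the content comparison from the TODO would slot in as an extra condition before a file counts as passed.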

The script is at scripts/roundtrip.py and the generated output is in scripts/roundtrip.out

Current output:

=== Round-trip results ===
  Passed:  28 / 33
  Failed:  5 / 33
  Skipped: 37 (transactions don't complete successfully)

Monitor failures (all are expected!):

  --- protocols/tests/adders/adder_d1/busy_wait_pass.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/adders/adder_d1/loop_with_assigns.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/adders/adder_d1/nested_busy_wait.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/fifo/push_pop_loop_empty.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

  --- protocols/tests/fifo/push_pop_loop_not_empty.tx ---
  thread 'main' panicked at monitor/src/interpreter.rs:466:17:
  not yet implemented: Bounded loops is not yet implemented in the monitor

Monitor bugs found and fixed (kinda?)

1. FST files with no time entries crash `fst-reader` (combinational-only designs)

Affected tests: add_combinational.tx, passthrough_combdep.tx

Root cause: Designs like add_d0 are purely combinational. When the interpreter runs a single cycle on them, the generated FST has no time entries. The fst-reader crate then panics at time_chain[0] (index out of bounds on an empty vec).

Quick Fix: Added an extra sim_step() at the end of execute_todos() in protocols/src/scheduler.rs so the FST always has at least one time entry. This empty cycle, from what I can tell, doesn't break anything for the monitor?

2. Monitor panics when a protocol argument is never mapped to a pin

Affected tests: add_combinational.tx (protocol add_combinational_illegal_observation_in_conditional has in b: u32 but does DUT.b := X, so b is never mapped to a trace value).

There are a few things going on with this test. One is that I believe add_combinational_illegal_observation_in_conditional is actually being picked up by the monitor as a valid trace, despite being illegal from the perspective of the interpreter. If it were flagged as illegal, we wouldn't get the following downstream error:

Root cause: to_protocol_application in monitor/src/interpreter.rs did unwrap_or_else(|| panic!(...)) when looking up an argument in args_mapping.

Quick Fix: Missing args are serialized as "?" instead of panicking. This could also be serialized as "X"? A protocol might have an argument it never uses, and that shouldn't be an error, I think. Let me know if I am wrong. This failure in general definitely requires further investigation. Regardless of the true upstream fix for the add_combinational test, I think it is reasonable for people to write valid protocols with an unused arg, and the monitor could deal with that more gracefully than it does now, unless unknown params cause other issues for monitor tractability.
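To illustrate the idea (in Python, not the Rust code in monitor/src/interpreter.rs), here is a small sketch of rendering a protocol application where unmapped arguments fall back to a placeholder. The function name `format_application` and its parameters are made up for this example.

```python
def format_application(name, params, args_mapping, placeholder="?"):
    """Render `name(arg, ...)`, using a placeholder for unmapped arguments.

    `params` is the ordered list of parameter names and `args_mapping`
    holds the values recovered from the trace. Both are hypothetical
    shapes chosen for this sketch.
    """
    rendered = [
        str(args_mapping[p]) if p in args_mapping else placeholder
        for p in params
    ]
    return f"{name}({', '.join(rendered)})"


# `b` was never mapped to a trace value, so it is printed as "?".
print(format_application("add", ["a", "b"], {"a": 1}))
```

Switching the placeholder to "X" is a one-argument change, which keeps that open question cheap to revisit.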

3. Monitor kills scheduler when a finished thread has slower siblings still running

Affected tests: both_threads_pass.tx.

Root cause: validate_finished_and_failed_threads in monitor/src/scheduler.rs returned an error if a thread finished but sibling threads from the same start cycle were still in the next queue. This is wrong when protocols have different lengths (e.g., add finishes in 2 cycles but wait_and_add takes longer). Ernest's meta scheduling thing is able to handle keeping the other "slower" trace around, but the current monitor logic was erroring instead. So, I just deleted that block of code. Now, both traces become valid.

Quick Fix: Instead of returning SchedulerError::NoTransactionsMatch, move the slower sibling threads from next to failed.

4. Empty blocks in monitor cause premature exits

Affected tests: passthrough_combdep.tx.

Root cause: In monitor/src/interpreter.rs, evaluate_stmt for Stmt::Block with an empty body returned Ok(None) (signaling "thread is done"), but it should have returned Ok(self.next_stmt_map[stmt_id]) to continue to the next statement in the parent scope. An empty if branch like if (cond) { } else { } would cause the thread to terminate early.

Actual Fix: Changed empty Block handling to use next_stmt_map instead of returning Ok(None). This mirrors the interpreter logic. I think this was a bug in the interpreter that was discovered only a few months ago and patched by me, which is probably why the monitor logic is off.
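A toy Python model of the empty-block fix, just to show the control flow; the real change is in the Rust `evaluate_stmt`. Statement IDs are plain integers here, and `next_after_block` is a hypothetical helper, not a function from the codebase.

```python
from typing import Dict, List, Optional


def next_after_block(
    stmt_id: int, body: List[int], next_stmt_map: Dict[int, int]
) -> Optional[int]:
    """Return the statement to run after entering block `stmt_id`.

    A non-empty block continues with its first child. An empty block falls
    through to whatever follows it in the parent scope. Returning None for
    every empty block (the old behavior) would signal "thread is done".
    """
    if body:
        return body[0]
    # None is only returned when nothing follows the block at all.
    return next_stmt_map.get(stmt_id)


# Block 10 is an empty `if` branch; statement 11 follows it in the parent.
print(next_after_block(10, [], {10: 11}))
```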

5. Protocols that don't end with `step()` should be discarded

Affected tests: both_threads_pass.tx (protocol add_doesnt_end_in_step ends with fork() instead of step()).

Root cause: When a thread finishes execution (Ok(None) in run_thread_till_next_step), the monitor unconditionally added it to the finished queue. Ill-formed protocols that don't end with step() would get treated as successful matches.

Quick Fix: Added a check in run_thread_till_next_step: if the last executed statement isn't Stmt::Step, the thread is moved to failed instead of finished.
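The check can be sketched in Python as below. This is an illustration of the rule only: the string labels for statement kinds and the function name are assumptions, and treating a thread that executed nothing as failed is my own choice for the sketch.

```python
def classify_finished_thread(executed_kinds):
    """Return "finished" only when the last executed statement is a step.

    `executed_kinds` is a list of statement kinds in execution order,
    e.g. ["assign", "step"]. The labels are hypothetical.
    """
    if executed_kinds and executed_kinds[-1] == "step":
        return "finished"
    return "failed"


# A protocol that ends with fork() instead of step() is discarded.
print(classify_finished_thread(["assign", "step", "fork"]))
```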

A weird side effect of this was that this rule also affects the stall protocol in the AXI stream tests. The stall protocol has assertions after its step():

prot stall<DUT: AXISManager>(out data: u32, out last: u1) {
    ...
    step();
    assert_eq(DUT.i_tdata, data);   // post-step assertion
    assert_eq(DUT.i_tlast, last);   // post-step assertion
}

I am sure these tests were written as such for a reason, but I'm confused as to why they yield valid results if the Protocols are not well-formed.

protocols/src/scheduler.rs

ekiwi · 2026-02-13T16:23:46Z

monitor/src/scheduler.rs

-                    "Thread {} (`{}`) finished but there are other threads with the same start time ({}) in the `next` queue, namely {:?}",
+            // ...any other threads from the same start cycle still in `next`
+            // are slower siblings that lost the race — move them to `failed`
+            let sibling_names: Vec<String> = self


With #174 we now want to allow for different matching traces. Instead of cutting them off, we report all of them, which is why you can see things like Trace 0 and Trace 1 in the output.
We probably need to think of a better way to integrate multiple traces in the output format so that it can remain compatible with the interpreter. The easiest would probably be to define a keyword that signals the start of a new trace, so in the interpreter, you could then reset everything and execute the new trace.
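As a rough sketch of what consuming such a format could look like, the Python below splits monitor output into separate traces. It assumes the `Trace N:` header lines visible in the snapshot output act as the delimiter; the actual keyword has not been decided, and `split_traces` is a hypothetical helper.

```python
import re

# Assumed delimiter, based on the "Trace 2:" lines in the snapshot output.
TRACE_HEADER = re.compile(r"^Trace \d+:\s*$")


def split_traces(output: str) -> list:
    """Split monitor output into one list of transaction lines per trace."""
    traces = []
    current = None
    for line in output.splitlines():
        if TRACE_HEADER.match(line):
            # A header starts a fresh trace; a consumer would reset its state here.
            current = []
            traces.append(current)
        elif line.strip():
            if current is None:
                # Output without any header is treated as a single trace.
                current = []
                traces.append(current)
            current.append(line.strip())
    return traces
```

Each inner list could then be fed to the interpreter as an independent run.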

Agreed, though I should point out the tests that led me to this change were failing to produce any trace in the monitor. I think this was just a small thing that wasn't removed in #174. I've removed my logic as well as the old monitor logic and tested; I checked that the monitor is able to produce multiple traces now for this test.

ekiwi · 2026-02-13T16:24:38Z

monitor/tests/fpga-debugging/axi-stream-s2/s2_fixed.out

-stall(7, 0)  // [time: 1012.5ns -> 1037.5ns] (thread 46)
-reset()  // [time: 1062.5ns -> 1087.5ns] (thread 48)
-reset()  // [time: 1087.5ns -> 1100ns] (thread 49)
-Trace 2:


Here you can see how your change means that the monitor only outputs a single trace instead of three different ones. However, we do want the monitor to output 3 traces for this example.

Getting rid of the change in the above comment got us back up to two traces. The problem with getting the third trace is that the stall protocol didn't end in step(), so traces with it were getting thrown out for well-formedness reasons. We need that check for interpreter-monitor parity, or we get rid of the check, update well-formedness checks, and modify the interpreter to handle these.

I'm noticing that a few of the monitor protocols, particularly for stalls, don't end in fork(); step() and just end with assertions. I'm wondering if this is an intentional exception to the well-formedness checks? When I add a fork(); step(); back in, I lose one of the three traces from the output. The reason is that the stall is now treated as a 2-cycle protocol, so the '1 cycle stall->1 cycle stall' trace disappears and we go from 3 possible traces to just 2...

If the intention is to relax the well-formedness checks to allow only assertions after the last step(), we can do that.
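For reference, a relaxed check along those lines might look like the Python sketch below. It is only an illustration: the statement-kind labels, the `allowed_trailing` parameter, and the function name are all assumptions, not the existing well-formedness code.

```python
def ends_well_formed(stmt_kinds, allowed_trailing=("assert_eq",)):
    """True if only allowed statement kinds follow the last "step".

    `stmt_kinds` is a list of top-level statement kinds in program order.
    A protocol with no step at all is rejected.
    """
    if "step" not in stmt_kinds:
        return False
    # Index of the last "step" in the list.
    last_step = len(stmt_kinds) - 1 - stmt_kinds[::-1].index("step")
    return all(kind in allowed_trailing for kind in stmt_kinds[last_step + 1:])


# The stall protocol shape: step() followed only by assertions.
print(ends_well_formed(["assign", "step", "assert_eq", "assert_eq"]))
```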

I told Nikil this was probably a mistake on my part when I hand-wrote the protocols since I didn't run them through the interpreter -- I will investigate this!

ekiwi · 2026-02-13T16:25:03Z

scripts/roundtrip.out

@@ -0,0 +1,32 @@
+
+=== Round-trip results ===


Should this be committed to the repo? Or do you want to add the filename to the .gitignore.

ekiwi · 2026-02-13T16:26:40Z

scripts/roundtrip.py

+                monitor_cmd, shell=True, cwd=base_dir,
+                capture_output=True, text=True,
+            )
+            if result.returncode == 0:


Do you plan on comparing the content of the .tx file that the monitor produces to the original .tx file?
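One possible shape for that comparison, sketched in Python. It assumes one transaction per line and that everything after `//` is an annotation (like the `[time: ...]` and thread-ID comments in the monitor snapshots); real .tx files may need more careful handling, and both function names are hypothetical.

```python
def normalize_tx(text: str) -> list:
    """Drop `//` comments, surrounding whitespace, and blank lines."""
    lines = []
    for raw in text.splitlines():
        code = raw.split("//", 1)[0].strip()
        if code:
            lines.append(code)
    return lines


def same_transactions(expected: str, actual: str) -> bool:
    """Compare two .tx texts after normalization."""
    return normalize_tx(expected) == normalize_tx(actual)


monitor_out = "stall(7, 0)  // [time: 1012.5ns -> 1037.5ns] (thread 46)\n"
print(same_transactions("stall(7, 0)\n", monitor_out))
```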

ngernest · 2026-02-14T19:46:06Z

As a heads-up, I created #181 to help make the round-trip tests easier, I would recommend merging #181 before this one!

Nikil-Shyamsunder changed the title ~~Round-trip testing~~ Round-trip testing POC Feb 13, 2026

ekiwi requested changes Feb 13, 2026

Nikil-Shyamsunder requested a review from ngernest February 14, 2026 01:45

Nikil-Shyamsunder added 4 commits February 13, 2026 18:37

round-trip testing + interp and monitor changes to reconcile the two

0d4310b

remove justfile comment

e02ebad

remove monitor logic to kill transactions of different finishing time…

ea2618e

…s, add fork() step() to end of a protocol

add new behavior for s2_fixed to snapshot tests

d4c98e0

Nikil-Shyamsunder force-pushed the round-trip-testing branch from 6538a59 to d4c98e0 Compare February 14, 2026 02:38

This was referenced Feb 14, 2026

[Monitor] Chore: Suppress thread IDs in output by default + use the same syntax as .tx files when printing traces #181

Merged

[Monitor] Double-check well-formedness of protocols for Brave New World bugs by running them through interpreter #185

Open

merge main

c269dc0

Nikil-Shyamsunder wants to merge 5 commits into main from round-trip-testing

3 participants
